.NET Knowledge Nuggets: A Note About Unicode

A while back, one of my coworkers sent out a late-night plea for help with a Unicode issue he was having. I thought I'd post the conversation here (slightly censored) for future reference:

Original email from coworker:

I’m looking for a little .NET help – particularly with the MailMessage class. I’m pulling the contents of an HTML page which is in French (and displaying properly in the webpage) and sending it via an email. I’m having difficulty getting the mail to use the correct encoding to show all the special characters correctly. If anyone has any experience doing this – please let me know.

My reply:

Make sure you're setting the .BodyEncoding property to Unicode. If you're getting "?" chars where special chars should be, it may actually be a problem with the way you're importing the content (the HTML could be getting munged before going into the MailMessage body). Take a look here: http://bytes.com/topic/asp-net/answers/345431-sending-mail-message-unicode-text

Coworker initial reply:

Thanks – I’m going to take a crack at this tomorrow. I’ve tried all the different encodings on the .BodyEncoding property but no luck. So I think you are right – it might be the way I am pulling the HTML. What a freaking pain.

And his follow-up:

You were right. That worked. I owe you a beer – remind me to buy it for you next happy hour.

That reminds me -- I need to collect on that beer.

Remember that if you don't handle Unicode correctly at every point where you touch the bytes (from initial read to final write), you risk corrupting the text, because some string or character library along the way assumes 8-bit chars or the wrong encoding. The key here was to ensure the initial read was reading it as Unicode (that is, decoding it with the encoding the content was actually saved in) -- something like this:

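// Note: in .NET, Encoding.Unicode means UTF-16 little-endian; if the source is UTF-8, pass Encoding.UTF8 instead.
var myReader = new StreamReader(fileName, System.Text.Encoding.Unicode);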
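
To show the whole read-to-write path in one place, here is a minimal sketch, not the code my coworker actually ended up with. It assumes the French page is stored as UTF-8; the class name, helper names, sample text, and email addresses are all made up for illustration. It reads the same file once with the wrong encoding and once with the right one, then builds (but does not send) a MailMessage with the body encoding set explicitly.

```csharp
using System;
using System.IO;
using System.Net.Mail;
using System.Text;

public static class UnicodeMailSketch
{
    // Hypothetical helper: decode a file with an explicitly chosen encoding.
    public static string ReadWith(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: build an HTML mail whose encodings are set explicitly.
    // UTF-8 is an assumption here; swap in another encoding if your recipients need it.
    public static MailMessage BuildMessage(string htmlBody)
    {
        var message = new MailMessage("from@example.com", "to@example.com");
        message.Subject = "Page en français";
        message.SubjectEncoding = Encoding.UTF8;
        message.Body = htmlBody;
        message.BodyEncoding = Encoding.UTF8;
        message.IsBodyHtml = true;
        return message;
    }

    public static void Main()
    {
        // Made-up sample content standing in for the French HTML page.
        string html = "<p>Café crème, s'il vous plaît</p>";
        string fileName = Path.GetTempFileName();

        // Assumption: the source page is saved as UTF-8 without a byte order mark.
        File.WriteAllText(fileName, html, new UTF8Encoding(false));

        // Wrong encoding on the initial read: the accented characters are lost
        // before the text ever reaches the MailMessage.
        string munged = ReadWith(fileName, Encoding.ASCII);

        // Matching encoding on the initial read: the text survives intact.
        string intact = ReadWith(fileName, Encoding.UTF8);

        Console.WriteLine(munged == html); // False
        Console.WriteLine(intact == html); // True

        using (var message = BuildMessage(intact))
        {
            // Hand the message to an SmtpClient here; sending is left out of this sketch.
            Console.WriteLine(message.BodyEncoding.WebName);
        }

        File.Delete(fileName);
    }
}
```

The point of the two reads is that no .BodyEncoding setting can rescue the first one: once the characters are lost on the way in, the mail just faithfully sends the damaged text.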